
ggml-cuda: add mem check for fusion #19916

Merged
am17an merged 3 commits into ggml-org:master from am17an:cuda_add_memcheck
Mar 6, 2026

Conversation

@am17an (Contributor) commented Feb 26, 2026

Fixes #19659

@github-actions bot added the labels "Nvidia GPU" (Issues specific to Nvidia GPUs) and "ggml" (changes relating to the ggml tensor library for machine learning) Feb 26, 2026
@am17an am17an marked this pull request as ready for review February 28, 2026 05:47
Comment on lines +122 to +132
// Sanitize NaN to -FLT_MAX so the iterative argmax produces unique expert IDs.
// NaN comparisons always return false, which would cause the same expert to be
// selected repeatedly. -FLT_MAX compares normally and is still excluded by the
// -INFINITY sentinel used after each selection round.
// More relevant for the cuBLAS path. See https://github.com/ggml-org/llama.cpp/issues/19659
#pragma unroll
    for (int i = 0; i < experts_per_thread; i++) {
        if (__isnanf(wt[i])) {
            wt[i] = -FLT_MAX;
        }
    }
@ORippler (Collaborator) commented Mar 2, 2026

  1. If the issue is in llama.cpp and not cuBLAS, I feel we should use fmaxf as a NaN-safe comparator: https://docs.nvidia.com/cuda/cuda-math-api/cuda_math_api/group__CUDA__MATH__SINGLE.html#_CPPv45fmaxfff (I presume we are talking about val_s > max_val_s later on in this kernel?)
  2. If the issue is in cuBLAS, I'd love more details so I can ask the cuBLAS team/take a look myself

@am17an (Contributor, Author) replied:

  1. Yes, but it's not just val_s > max_val_s, it's val_s > max_val_s || (val_s == max_val_s && expert < max_expert)
  2. The linked issue has a repro. It's cuBLAS + Nemotron, so think it would be fun for you guys to look at :)

@ORippler (Collaborator) commented Mar 2, 2026

> Yes, but it's not just val_s > max_val_s, it's val_s > max_val_s || (val_s == max_val_s && expert < max_expert)

Shouldn't we be fine with fmaxf, so long as max_val & max_val_s are initialized to -FLT_MAX instead of -INFINITY at the beginning of the selection-loop over n_expert_used? At least for the case where k non-NAN values exist inside the logits for a given row. But at this point we are just pulling your proposal into the loop itself 😄

@am17an (Contributor, Author) commented Mar 5, 2026

@JohannesGaessler can you review this PR? Apart from the NaN check, it also fixes a latent bug.

Comment on lines +3382 to +3386
if ((b_start <= a_start && a_start < b_end) || (a_start <= b_start && b_start < a_end)) {
return true;
}

return false;
A contributor commented:

Suggested change
if ((b_start <= a_start && a_start < b_end) || (a_start <= b_start && b_start < a_end)) {
return true;
}
return false;
return (b_start <= a_start && a_start < b_end) || (a_start <= b_start && b_start < a_end);

This would maybe be slightly simpler but either way is fine I think.

Co-authored-by: Johannes Gäßler <johannesg@5d6.de>
@am17an am17an merged commit d48e876 into ggml-org:master Mar 6, 2026
73 of 75 checks passed
@am17an am17an deleted the cuda_add_memcheck branch March 6, 2026 16:05
bartowski1182 pushed a commit to bartowski1182/llama.cpp that referenced this pull request Mar 10, 2026
* ggml-cuda: add mem check for fusion

* Replace NaNs with -FLT_MAX

* fix typo

Co-authored-by: Johannes Gäßler <johannesg@5d6.de>

---------

Co-authored-by: Johannes Gäßler <johannesg@5d6.de>
Ethan-a2 pushed a commit to Ethan-a2/llama.cpp that referenced this pull request Mar 20, 2026

Labels

ggml (changes relating to the ggml tensor library for machine learning), Nvidia GPU (Issues specific to Nvidia GPUs)

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Eval bug: [CUDA, cuBLAS] Corrupted output on CUBLAS with moe models like Nemotron-3-nano and gpt-oss-120b with long context preprocessing

3 participants